For n independent trials, each of which results in a success for exactly one of k categories, with each category having a fixed success probability, the multinomial distribution gives the probability of any particular combination of success counts across the categories. For example, it models the counts of each face when a k-sided die is rolled n times.

Specification

$\begin{align} f(x_1,\ldots,x_k;n,p_1,\ldots,p_k) & {} = \Pr(X_1 = x_1\mbox{ and }\dots\mbox{ and }X_k = x_k) \\ & {} = \begin{cases} { \displaystyle {n! \over x_1!\cdots x_k!}p_1^{x_1}\cdots p_k^{x_k}}, \quad & \mbox{when } \sum_{i=1}^k x_i=n \\ \\ 0 & \mbox{otherwise,} \end{cases} \\ & {} = \frac{\Gamma(\sum_i x_i + 1)}{\prod_i \Gamma(x_i+1)} \prod_{i=1}^k p_i^{x_i}. \end{align}$
for non-negative integers $x_1,\dots, x_k$.

The last form, expressed using the Gamma function, shows the resemblance to the Dirichlet distribution, which is the conjugate prior of the multinomial distribution.
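
The specification above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names `multinomial_pmf` and `multinomial_pmf_gamma` are hypothetical, and only the standard library is assumed. The first function follows the factorial form, the second the Gamma-function form evaluated in log space.

```python
from math import exp, factorial, lgamma, log, prod


def multinomial_pmf(counts, n, probs):
    """Factorial form: probability of `counts` in n trials, or 0 off the support."""
    if any(x < 0 for x in counts) or sum(counts) != n:
        return 0.0
    # Multinomial coefficient n! / (x_1! ... x_k!), kept as an exact integer.
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)
    return coefficient * prod(p ** x for p, x in zip(probs, counts))


def multinomial_pmf_gamma(counts, n, probs):
    """Gamma-function form, evaluated in log space to avoid huge factorials."""
    if any(x < 0 for x in counts) or sum(counts) != n:
        return 0.0
    # A category with zero probability cannot have a positive count.
    if any(p == 0 and x > 0 for p, x in zip(probs, counts)):
        return 0.0
    log_coefficient = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    log_powers = sum(x * log(p) for p, x in zip(probs, counts) if x > 0)
    return exp(log_coefficient + log_powers)


# Two categories with equal probability, three trials, counts (2, 1).
example = multinomial_pmf([2, 1], 3, [0.5, 0.5])  # 0.375
```

The log-space variant is the one that is usually preferable for large n, since the intermediate factorials never have to be formed explicitly.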

Related Distributions

When k = 2, the multinomial distribution is the binomial distribution.
The categorical distribution is the distribution of each individual trial; for k = 2, this is the Bernoulli distribution.
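
The k = 2 case can be checked directly from the probability mass function. Writing $x_2 = n - x_1$ and $p_2 = 1 - p_1$ (a substitution forced by the constraints that the counts sum to n and the probabilities sum to 1), the multinomial coefficient collapses to the binomial coefficient:

$f(x_1, n - x_1; n, p_1, 1 - p_1) = \frac{n!}{x_1!\,(n - x_1)!}\, p_1^{x_1} (1 - p_1)^{n - x_1} = \binom{n}{x_1} p_1^{x_1} (1 - p_1)^{n - x_1}$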

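In many fields, especially natural language processing (NLP), the categorical distribution and the multinomial distribution are often conflated, although this is imprecise: the former describes a single trial, the latter the counts over n trials.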
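
The difference between the two is perhaps easiest to see in what a sample looks like. The sketch below uses only the standard library; the helper names `sample_categorical` and `sample_multinomial` are hypothetical and chosen for illustration. A categorical draw is a single category index, while a multinomial draw is a vector of k counts that sums to n.

```python
import random
from collections import Counter


def sample_categorical(probs, rng):
    """One trial: return the index of the single category that occurred."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_multinomial(n, probs, rng):
    """n independent trials: return the number of times each category occurred."""
    outcomes = rng.choices(range(len(probs)), weights=probs, k=n)
    tally = Counter(outcomes)
    return [tally[i] for i in range(len(probs))]


rng = random.Random(0)  # seeded only so the sketch is reproducible
probs = [0.2, 0.5, 0.3]
single_outcome = sample_categorical(probs, rng)    # one index in {0, 1, 2}
count_vector = sample_multinomial(10, probs, rng)  # three counts summing to 10
```

Under this view, a multinomial sample with n = 1 is just the one-hot encoding of a categorical sample, which is likely why the two names are used interchangeably in practice.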

Properties

Expectation

$\operatorname{E}(X_i) = n p_i$

Covariance matrix

Each diagonal entry is the variance of a binomially distributed random variable, and is therefore

$\operatorname{var}(X_i)=np_i(1-p_i)$
The off-diagonal entries are the covariances:

$\operatorname{cov}(X_i,X_j)=-np_i p_j$
for i, j distinct.
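
The expectation and covariance formulas can be verified exactly for a small case by summing over the whole support. The sketch below is one way to do this, assuming nothing beyond the standard library; the helper names (`compositions`, `pmf`, `exact_moments`) and the chosen values of n and the probabilities are illustrative. Exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction
from math import factorial


def compositions(n, k):
    """Yield every tuple of k non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def pmf(counts, probs):
    """Exact multinomial probability of `counts`, with n = sum(counts)."""
    value = Fraction(factorial(sum(counts)))
    for x, p in zip(counts, probs):
        value = value / factorial(x) * p ** x
    return value


def exact_moments(n, probs):
    """Return (mean vector, covariance matrix) by enumerating the support."""
    k = len(probs)
    support = [(c, pmf(c, probs)) for c in compositions(n, k)]
    mean = [sum(w * c[i] for c, w in support) for i in range(k)]
    cov = [[sum(w * c[i] * c[j] for c, w in support) - mean[i] * mean[j]
            for j in range(k)] for i in range(k)]
    return mean, cov


n = 4
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
mean, cov = exact_moments(n, probs)

# Compare against the closed-form expressions from the text.
assert mean == [n * p for p in probs]
assert all(cov[i][i] == n * probs[i] * (1 - probs[i]) for i in range(3))
assert all(cov[i][j] == -n * probs[i] * probs[j]
           for i in range(3) for j in range(3) if i != j)
```

The negative off-diagonal entries reflect the fixed total: with n trials in all, a higher count in one category leaves fewer trials for the others.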

Reference

Multinomial distribution: https://en.wikipedia.org/wiki/Multinomial_distribution
